{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "MD"
    }
   },
   "source": [
    "### Notebook to demonstrate TAO-Remote Client AutoML workflow for Object Detection\n",
    "\n",
    "Transfer learning is the process of transferring learned features from one application to another. It is a commonly used training technique where you use a model trained on one task and re-train to use it on a different task. Train Adapt Optimize (TAO) Toolkit  is a simple and easy-to-use Python based AI toolkit for taking purpose-built AI models and customizing them with users' own data.\n",
    "\n",
    "![image](https://d29g4g2dyqv443.cloudfront.net/sites/default/files/akamai/TAO/tlt-tao-toolkit-bring-your-own-model-diagram.png)\n",
    "\n",
    "\n",
    "\n",
    "### Learning Objective\n",
    "\n",
    "This AutoML notebook applies to identifying the optimal hyperparameters (e.g., learning rate, batch size, weight regularizer, number of layers, etc.) in order to obtain better accuracy results or converge faster on AI models for object detection application.\n",
    "- Take a pretrained model and choose automl algorithm/parameters to start AutoML train.\n",
    "- At the end of an AutoML run, you will receive a config file that specifies the best performing model, along with the binary model file to deploy it to your application.\n",
    "\n",
    "\n",
    "### The workflow in a nutshell\n",
    "\n",
    "- Creating train and eval dataset\n",
    "- Upload datasets to the service\n",
    "- Running dataset convert\n",
    "- Set AutoML algorithm configurations\n",
    "  - Add/Remove AutoML parameters\n",
    "- Override train config defaults\n",
    "- Run AutoML\n",
    "\n",
    "\n",
    "### AutoML Workflow\n",
    "\n",
    "User starts with selecting model topology, create and upload dataset, configuring parameters, training with AutoML to comparing the model.\n",
    "\n",
    "![image](https://raw.githubusercontent.com/vpraveen-nv/model_card_images/main/api/automl_workflow.png)\n",
    "\n",
    "### Table of contents\n",
    "\n",
    "1. [Install TAO remote client](#head-1)\n",
    "1. [Set the remote service base URL](#head-2)\n",
    "1. [Access the shared volume](#head-3)\n",
    "1. [Create the datasets](#head-4)\n",
    "1. [List datasets](#head-5)\n",
    "1. [Provide and customize dataset convert specs](#head-6)\n",
    "1. [Run dataset convert](#head-7)\n",
    "1. [Create a model experiment ](#head-8)\n",
    "1. [Find pretrained model](#head-9)\n",
    "1. [Set AutoML related configurations](#head-10)\n",
    "1. [Provide train specs](#head-11)\n",
    "1. [Run AutoML train](#head-12)\n",
    "1. [Get the best model from AutoML](#head-13)\n",
    "1. [Delete experiment](#head-14)\n",
    "1. [Delete datasets](#head-15)\n",
    "1. [Unmount shared volume](#head-16)\n",
    "\n",
    "### Requirements\n",
    "Please find the server requirements [here](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_setup.html#)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "CODE"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import os\n",
    "import glob\n",
    "import subprocess\n",
    "import getpass\n",
    "import uuid\n",
    "import json"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "namespace = 'default'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### FIXME\n",
    "\n",
    "1. Assign a model_name in FIXME 1\n",
    "2. Choose between default or custom dataset in FIXME 2\n",
    "3. Assign the ip_address and port_number in FIXME 3 and FIXME 4 ([info](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_rest_api.html))\n",
    "4. Set NGC API key in FIXME 5\n",
    "5. Assign path of data directory in FIXME 6\n",
    "7. Choose between bayesian and hyperband automl_algorithm in FIXME 7"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Available models (#FIXME 1):\n",
    "# 1. detectnet-v2 - https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/detectnet_v2.html\n",
    "# 2. efficientdet - https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/efficientdet.html\n",
    "# 3. faster-rcnn - https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/fasterrcnn.html\n",
    "# 4. retinanet - https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/retinanet.html\n",
    "# 5. ssd - https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/ssd.html\n",
    "# 6. yolo-v3 - https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/yolo_v3.html\n",
    "# 7. yolo-v4 - https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/yolo_v4.html\n",
    "# 8. yolo-v4-tiny - https://docs.nvidia.com/tao/tao-toolkit/text/object_detection/yolo_v4_tiny.html\n",
    "\n",
    "model_name = \"yolo-v4\" # FIXME1 (Add the model name from the above mentioned list)\n",
    "dataset_to_be_used = \"default\" # FIXME2 example: default/custom; default for the dataset used in this tutorial notebook; custom for a different dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "MD"
    }
   },
   "source": [
    "### Install TAO remote client <a class=\"anchor\" id=\"head-1\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "CODE"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# SKIP this step IF you have already installed the TAO-Client wheel.\n",
    "! pip3 install nvidia-transfer-learning-client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# View the version of the TAO-Client\n",
    "! nvtl --version"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "MD"
    }
   },
   "source": [
    "### Set the remote service base URL <a class=\"anchor\" id=\"head-2\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Define the node_addr and port number\n",
    "node_addr = \"<ip_address>\" # FIXME3 example: 10.137.149.22\n",
    "node_port = \"<port_number>\" # FIXME4 example: 32334\n",
    "# In host machine, node ip_address and port number can be obtained as follows,\n",
    "# ip_address: hostname -i\n",
    "# port_number: kubectl get service ingress-nginx-controller -o jsonpath='{.spec.ports[0].nodePort}'\n",
    "%env BASE_URL=http://{node_addr}:{node_port}/{namespace}/api/v1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "CODE"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# FIXME: Set ngc_api_key valiable\n",
    "ngc_api_key = \"<ngc_api_key>\" # FIXME5 example: zZYtczM5amdtdDcwNjk0cnA2bGU2bXQ3bnQ6NmQ4NjNhMDItMTdmZS00Y2QxLWI2ZjktNmE5M2YxZTc0OGyM\n",
    "\n",
    "# Exchange NGC_API_KEY for JWT\n",
    "identity = json.loads(subprocess.getoutput(f'nvtl login --ngc-api-key {ngc_api_key}'))\n",
    "\n",
    "%env USER={identity['user_id']}\n",
    "%env TOKEN={identity['token']}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "MD"
    }
   },
   "source": [
    "### Access the shared volume <a class=\"anchor\" id=\"head-3\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Get PVC ID\n",
    "pvc_id = subprocess.getoutput(f'kubectl get pvc tao-toolkit-api-pvc -n {namespace} -o jsonpath=\"{{.spec.volumeName}}\"')\n",
    "print(pvc_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Get NFS server info\n",
    "provisioner = json.loads(subprocess.getoutput(f'helm get values nfs-subdir-external-provisioner -o json'))\n",
    "nfs_server = provisioner['nfs']['server']\n",
    "nfs_path = provisioner['nfs']['path']\n",
    "print(nfs_server, nfs_path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "user = getpass.getuser()\n",
    "home = os.path.expanduser('~')\n",
    "\n",
    "! echo \"Password for {user}\"\n",
    "password = getpass.getpass()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Mount shared volume \n",
    "! mkdir -p ~/shared\n",
    "\n",
    "command = \"apt-get -y install nfs-common >> /dev/null\"\n",
    "! echo {password} | sudo -S -k {command}\n",
    "\n",
    "command = f\"mount -t nfs {nfs_server}:{nfs_path}/{namespace}-tao-toolkit-api-pvc-{pvc_id} ~/shared\"\n",
    "! echo {password} | sudo -S -k {command} && echo DONE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "MD"
    }
   },
   "source": [
    "### Create the datasets <a class=\"anchor\" id=\"head-4\"></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "MD"
    }
   },
   "source": [
    "We will be using NVIDIA's synthetic dataset on warehouse images based on the `kitti object detection dataset` format in this example. To find more details about kitti, please visit [here](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**If using custom dataset; it should follow this dataset structure**\n",
    "```\n",
    "DATA_DIR/train\n",
    "├── images/\n",
    "│   ├── image_name_1.jpg\n",
    "│   ├── image_name_2.jpg\n",
    "|   ├── ...\n",
    "└── labels\n",
    "    ├── image_name_1.txt\n",
    "    ├── image_name_2.txt\n",
    "    ├── ...\n",
    "\n",
    "DATA_DIR/val\n",
    "├── images\n",
    "│   ├── image_name_1.jpg\n",
    "│   ├── image_name_2.jpg\n",
    "|   ├── ...\n",
    "└── labels\n",
    "    ├── image_name_1.txt\n",
    "    ├── image_name_2.txt\n",
    "    ├── ...\n",
    "```\n",
    "The file name should be same for images and labels folders"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "DATA_DIR = model_name # FIXME6\n",
    "os.environ['DATA_DIR']= DATA_DIR\n",
    "!mkdir -p $DATA_DIR"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if dataset_to_be_used == \"default\":\n",
    "    !aws s3 cp s3://tao-detection-synthetic-dataset-dev/tao_od_synthetic_train.tar.gz $DATA_DIR/\n",
    "    !aws s3 cp s3://tao-detection-synthetic-dataset-dev/tao_od_synthetic_val.tar.gz $DATA_DIR/\n",
    "    !tar -xzf {DATA_DIR}/tao_od_synthetic_train.tar.gz -C {DATA_DIR}/\n",
    "    !tar -xzf {DATA_DIR}/tao_od_synthetic_val.tar.gz -C {DATA_DIR}/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "if model_name == \"efficientdet\":\n",
    "    ds_format = \"coco\"\n",
    "else:\n",
    "    ds_format = \"kitti\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "CODE"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "train_dataset_id = subprocess.getoutput(f\"nvtl {model_name} dataset-create --dataset_type object_detection --dataset_format {ds_format}\")\n",
    "print(train_dataset_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "TRAIN_DATA_DIR = f\"{DATA_DIR}/train\"\n",
    "if model_name == \"efficientdet\":\n",
    "    import subprocess\n",
    "    !python3 -m pip install ujson\n",
    "    num_classes = subprocess.getoutput(f'python3 ../dataset_prepare/kitti/kitti_to_coco.py {TRAIN_DATA_DIR}/labels {TRAIN_DATA_DIR}')\n",
    "    ! rsync -ah --info=progress2 {TRAIN_DATA_DIR}/images ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}/\n",
    "    ! rsync -ah --info=progress2 {TRAIN_DATA_DIR}/annotations.json ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}/annotations.json\n",
    "else:\n",
    "    ! rsync -ah --info=progress2 {TRAIN_DATA_DIR}/images ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}/\n",
    "    ! rsync -ah --info=progress2 {TRAIN_DATA_DIR}/labels ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}/\n",
    "! echo DONE"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "CODE"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "eval_dataset_id = subprocess.getoutput(f\"nvtl {model_name} dataset-create --dataset_type object_detection --dataset_format {ds_format}\")\n",
    "print(eval_dataset_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "VAL_DATA_DIR = f\"{DATA_DIR}/val\"\n",
    "if model_name == \"efficientdet\":\n",
    "    subprocess.getoutput(f'python3 ../dataset_prepare/kitti/kitti_to_coco.py {VAL_DATA_DIR}/labels {VAL_DATA_DIR}')\n",
    "    ! rsync -ah --info=progress2 {VAL_DATA_DIR}/images ~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}/\n",
    "    ! rsync -ah --info=progress2 {VAL_DATA_DIR}/annotations.json ~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}/annotations.json\n",
    "else:\n",
    "    ! rsync -ah --info=progress2 {VAL_DATA_DIR}/images ~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}/\n",
    "    ! rsync -ah --info=progress2 {VAL_DATA_DIR}/labels ~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}/\n",
    "! echo DONE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "MD"
    }
   },
   "source": [
    "### List datasets <a class=\"anchor\" id=\"head-5\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "CODE"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "pattern = os.path.join(home, 'shared', 'users', os.environ['USER'], 'datasets', '*', 'metadata.json')\n",
    "\n",
    "datasets = []\n",
    "for metadata_path in glob.glob(pattern):\n",
    "    with open(metadata_path, 'r') as metadata_file:\n",
    "        datasets.append(json.load(metadata_file))\n",
    "\n",
    "print(json.dumps(datasets, indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "MD"
    }
   },
   "source": [
    "### Provide and customize dataset convert specs <a class=\"anchor\" id=\"head-6\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Choose dataset convert action\n",
    "if model_name in (\"ssd\", \"retinanet\"):\n",
    "    convert_action = \"convert_and_index\"\n",
    "elif model_name in (\"efficientdet\"):\n",
    "    convert_action = \"convert_efficientdet\"\n",
    "else:\n",
    "    convert_action = \"convert\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "CODE"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Default train dataset specs\n",
    "! nvtl {model_name} dataset-convert-defaults --id {train_dataset_id} --action {convert_action} | tee ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}/specs/{convert_action}.json"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Customize train dataset specs\n",
    "specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'datasets', train_dataset_id, 'specs', f'{convert_action}.json')\n",
    "\n",
    "with open(specs_path , \"r\") as specs_file:\n",
    "    specs = json.load(specs_file)\n",
    "\n",
    "# Apply changes\n",
    "if model_name == \"efficientdet\":\n",
    "    specs[\"coco_config\"][\"num_shards\"] = 256\n",
    "    specs[\"coco_config\"][\"tag\"] = \"train\"\n",
    "else:\n",
    "    specs[\"kitti_config\"][\"image_extension\"] = \".jpg\" #Change to png if your entire dataset is of png format\n",
    "\n",
    "if convert_action == \"convert_and_index\":\n",
    "    specs[\"target_class_mapping\"] = [   {\"key\":\"bus\",\"value\":\"bus\"}, # Enter the classes of your dataset here\n",
    "                                        {\"key\":\"person\",\"value\":\"person\"},\n",
    "                                    ]\n",
    "\n",
    "with open(specs_path, \"w\") as specs_file:\n",
    "    json.dump(specs, specs_file, indent=2)\n",
    "\n",
    "print(json.dumps(specs, indent=2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "CODE"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Default eval dataset specs\n",
    "! nvtl {model_name} dataset-convert-defaults --id {eval_dataset_id} --action {convert_action} | tee ~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}/specs/{convert_action}.json"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Customize eval dataset specs\n",
    "specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'datasets', eval_dataset_id, 'specs', f'{convert_action}.json')\n",
    "\n",
    "with open(specs_path , \"r\") as specs_file:\n",
    "    specs = json.load(specs_file)\n",
    "\n",
    "# Apply changes\n",
    "if model_name == \"efficientdet\":\n",
    "    specs[\"coco_config\"][\"num_shards\"] = 256\n",
    "    specs[\"coco_config\"][\"tag\"] = \"val\"\n",
    "else:\n",
    "    specs[\"kitti_config\"][\"image_extension\"] = \".jpg\" #Change to png if your entire dataset is of png format\n",
    "\n",
    "if convert_action == \"convert_and_index\":\n",
    "    specs[\"target_class_mapping\"] = [   {\"key\":\"bus\",\"value\":\"bus\"}, # Enter the classes of your dataset here\n",
    "                                        {\"key\":\"person\",\"value\":\"person\"},\n",
    "                                    ]\n",
    "\n",
    "with open(specs_path, \"w\") as specs_file:\n",
    "    json.dump(specs, specs_file, indent=2)\n",
    "\n",
    "print(json.dumps(specs, indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "MD"
    }
   },
   "source": [
    "### Run dataset convert <a class=\"anchor\" id=\"head-7\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "CODE"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "train_convert_job_id = subprocess.getoutput(f\"nvtl {model_name} dataset-convert --id {train_dataset_id}  --action {convert_action} \")\n",
    "print(train_convert_job_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "def my_tail(logs_dir, log_file):\n",
    "    %env LOG_FILE={logs_dir}/{log_file}\n",
    "    ! mkdir -p {logs_dir}\n",
    "    ! [ ! -f \"$LOG_FILE\" ] && touch $LOG_FILE && chmod 666 $LOG_FILE\n",
    "    ! tail -f -n +1 $LOG_FILE | while read LINE; do echo \"$LINE\"; [[ \"$LINE\" == \"EOF\" ]] && pkill -P $$ tail; done\n",
    "    \n",
    "# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)\n",
    "logs_dir = os.path.join(home, 'shared', 'users', os.environ['USER'], 'datasets', train_dataset_id, 'logs')\n",
    "log_file = f\"{train_convert_job_id}.txt\"\n",
    "\n",
    "my_tail(logs_dir, log_file)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "CODE"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "eval_convert_job_id = subprocess.getoutput(f\"nvtl {model_name} dataset-convert --id {eval_dataset_id}  --action {convert_action} \")\n",
    "print(eval_convert_job_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Check status (the file won't exist until the backend Toolkit container is running -- can take several minutes)\n",
    "logs_dir = os.path.join(home, 'shared', 'users', os.environ['USER'], 'datasets', eval_dataset_id, 'logs')\n",
    "log_file = f\"{eval_convert_job_id}.txt\"\n",
    "\n",
    "my_tail(logs_dir, log_file)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "MD"
    }
   },
   "source": [
    "### Create a model experiment <a class=\"anchor\" id=\"head-8\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "CODE"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "network_arch = model_name.replace(\"-\",\"_\")\n",
    "model_id = subprocess.getoutput(f\"nvtl {model_name} model-create --network_arch {network_arch} --encryption_key tlt_encode \")\n",
    "print(model_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Assign train, eval datasets "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "metadata_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'metadata.json')\n",
    "\n",
    "with open(metadata_path , \"r\") as metadata_file:\n",
    "    metadata = json.load(metadata_file)\n",
    "\n",
    "metadata[\"train_datasets\"] = [train_dataset_id]\n",
    "metadata[\"eval_dataset\"] = eval_dataset_id"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "MD"
    }
   },
   "source": [
    "### Find pretrained model <a class=\"anchor\" id=\"head-9\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Assigning pretrained models to different yolo versions\n",
    "# print $BASE_URL+\"/model\" to get the details of all pretrained models and make the appropriate changes to this map for experiments like for example \n",
    "# you are changing the number of layers to 34, then you have to make the appropriate change in the pretrained model name\n",
    "# print(base_url+\"/model\")\n",
    "pretrained_map = {\"detectnet_v2\" : \"detectnet_v2:resnet18\",\n",
    "                  \"efficientdet\" : \"pretrained_efficientdet:efficientnet_b0\",\n",
    "                  \"faster_rcnn\" : \"pretrained_object_detection:resnet18\",\n",
    "                  \"retinanet\" : \"pretrained_object_detection:resnet18\",\n",
    "                  \"ssd\" : \"pretrained_object_detection:resnet18\",\n",
    "                  \"yolo_v3\" : \"pretrained_object_detection:resnet18\",\n",
    "                  \"yolo_v4\" : \"pretrained_object_detection:resnet18\",\n",
    "                  \"yolo_v4_tiny\": \"pretrained_object_detection:cspdarknet_tiny\"}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "CODE"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "pattern = os.path.join(home, 'shared', 'users', '*', 'models', '*', 'metadata.json')\n",
    "\n",
    "ptm_id = None\n",
    "for ptm_metadata_path in glob.glob(pattern):\n",
    "  with open(ptm_metadata_path, 'r') as metadata_file:\n",
    "    ptm_metadata = json.load(metadata_file)\n",
    "    ngc_path = ptm_metadata.get(\"ngc_path\")\n",
    "    metadata_network_arch = ptm_metadata.get(\"network_arch\")\n",
    "    if metadata_network_arch == network_arch and ngc_path.endswith(pretrained_map[network_arch]):\n",
    "      ptm_id = ptm_metadata[\"id\"]\n",
    "      break\n",
    "\n",
    "metadata[\"ptm\"] = ptm_id\n",
    "print(ptm_id)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### View hyperparameters that are enabled for AutoML by default"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# View default automl specs enabled\n",
    "! nvtl {model_name} model-automl-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/automl_defaults.json"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "MD"
    }
   },
   "source": [
    "### Set AutoML related configurations <a class=\"anchor\" id=\"head-10\"></a>\n",
    "Refer to these hyper-links to see the parameters supported by each network and add more parameters if necessary in addition to the default automl enabled parameters: [DetectNet_V2](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_action_specs.html#id4), \n",
    "[EfficientDet](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_action_specs.html#id13), \n",
    "[FasterRCNN](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_action_specs.html#id16), \n",
    "[RetinaNet](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_action_specs.html#id32), \n",
    "[SSD](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_action_specs.html#id38), \n",
    "[YOLO_V3](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_action_specs.html#id52), \n",
    "[YOLO_V4](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_action_specs.html#id58), \n",
    "[YOLO_V4_Tiny](https://docs.nvidia.com/tao/tao-toolkit/text/tao_toolkit_api/api_action_specs.html#id58)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "CODE"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Choose automl algorithm between \"bayesian\" and \"hyperband\".\n",
    "automl_algorithm=\"bayesian\" # FIXME7 example: bayesian/hyperband\n",
    "\n",
    "#Don't change this, in future multiple metrics will be supported\n",
    "if model_name == \"efficientdet\":\n",
    "    metric = \"kpi\"\n",
    "else:\n",
    "    metric = \"map\"\n",
    "\n",
    "additional_automl_parameters = [] #Refer to parameter list mentioned in the above links and add any extra parameter in addition to the default enabled ones\n",
    "remove_default_automl_parameters = [] #Remove any hyperparameters that are enabled by default for AutoML\n",
    "\n",
    "metadata[\"automl_algorithm\"] = automl_algorithm\n",
    "metadata[\"automl_enabled\"] = True\n",
    "metadata[\"metric\"] = metric\n",
    "metadata[\"automl_add_hyperparameters\"] = str(additional_automl_parameters)\n",
    "metadata[\"automl_remove_hyperparameters\"] = str(remove_default_automl_parameters)\n",
    "\n",
    "with open(metadata_path, \"w\") as metadata_file:\n",
    "    json.dump(metadata, metadata_file, indent=2)\n",
    "\n",
    "print(json.dumps(metadata, indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "MD"
    }
   },
   "source": [
    "### Provide train specs <a class=\"anchor\" id=\"head-11\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "CODE"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Default train model specs\n",
    "! nvtl {model_name} model-train-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/train.json"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Customize train model specs\n",
    "specs_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, 'specs', 'train.json')\n",
    "\n",
    "with open(specs_path , \"r\") as specs_file:\n",
    "    specs = json.load(specs_file)\n",
    "\n",
    "# Apply changes for any of the parameters listed in the previous cell as required\n",
    "# Example for detectnet_v2 (for each network the parameter key might be different)\n",
    "specs[\"training_config\"][\"num_epochs\"] = 80 # num_epochs is the parameter name for all object detection networks\n",
    "\n",
    "# Change the dimension related specs to the original dimension of the image for better accuracy, the default dimension is kitti dataset's dimension of 1248x384\n",
    "\n",
    "# for efficientdet\n",
    "# specs[\"training_config\"][\"train_batch_size\"] = 8\n",
    "# specs[\"training_config\"][\"num_examples_per_epoch\"] = 1257 # number of images in your dataset/number of gpu's\n",
    "# specs[\"eval_config\"][\"eval_epoch_cycle\"] = 10\n",
    "# specs[\"dataset_config\"][\"num_classes\"] = int(num_classes) # num_classes was computed during kitti_to_coco_conversion\n",
    "\n",
    "\n",
    "if \"image_extension\" in specs[\"dataset_config\"].keys():\n",
    "    specs[\"dataset_config\"][\"image_extension\"] = \"jpg\"\n",
    "\n",
    "with open(specs_path, \"w\") as specs_file:\n",
    "    json.dump(specs, specs_file, indent=2)\n",
    "\n",
    "print(json.dumps(specs, indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "MD"
    }
   },
   "source": [
    "### Run AutoML train <a class=\"anchor\" id=\"head-12\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "CODE"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "train_job_id = subprocess.getoutput(f\"nvtl {model_name} model-train --id \" + model_id)\n",
    "print(train_job_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Set poll_automl_stats to True if just want to see what's the time left, how many epochs are remaining etc.\n",
    "# Set poll_automl_stats to False if you want to skip stats and see the training logs instead. Training logs viewing are supported only for bayesian\n",
    "\n",
    "poll_automl_stats = True\n",
    "if poll_automl_stats:\n",
    "    import time\n",
    "    from IPython.display import clear_output\n",
    "    stats_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, train_job_id, \"automl_metadata.json\")\n",
    "    controller_json_path = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id, train_job_id, \"controller.json\")\n",
    "    while True:\n",
    "        time.sleep(15)\n",
    "        clear_output(wait=True)\n",
    "        if os.path.exists(stats_path):\n",
    "            try:\n",
    "                with open(stats_path , \"r\") as stats_file:\n",
    "                    stats_dict = json.load(stats_file)\n",
    "                print(json.dumps(stats_dict, indent=2))\n",
    "                if float(stats_dict[\"Number of epochs yet to start\"]) == 0.0:\n",
    "                    break\n",
    "            except (json.JSONDecodeError):\n",
    "                print(\"Stats computed are being written to file. Stats will be visible on screen in a few seconds\")\n",
    "else:\n",
    "    # Print the log file - supported only for bayesian (the file won't exist until the backend Toolkit container is running -- can take several minutes)\n",
    "    if automl_algorithm == \"bayesian\":\n",
    "        logs_dir = os.path.join(home, 'shared', 'users', os.environ['USER'], 'models', model_id)\n",
    "        max_recommendations = metadata.get(\"automl_max_recommendations\",20)\n",
    "        for experiment_num in range(max_recommendations):\n",
    "            log_file = f\"{train_job_id}/experiment_{experiment_num}/log.txt\"\n",
    "            while True:\n",
    "                if os.path.exists(os.path.join(logs_dir, log_file)):\n",
    "                    break\n",
    "            print(f\"\\n\\nViewing experiment {experiment_num}\\n\\n\")\n",
    "            my_tail(logs_dir, log_file)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get the best model from AutoML <a class=\"anchor\" id=\"head-13\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The config and the weights of the best configuration are present at best_model folder\n",
    "# Takes a few seconds to copy the original automl experiment to best_model folder\n",
    "\n",
    "# Training times for different models benchmarked on 1 GPU V100 machine can be found here: https://docs.nvidia.com/tao/tao-toolkit/text/automl/automl.html#results-of-automl-experiments\n",
    "\n",
    "!python3 -m pip install pandas\n",
    "import pandas as pd\n",
    "\n",
    "automl_job_dir = f\"{home}/shared/users/{os.environ['USER']}/models/{model_id}/{train_job_id}\"\n",
    "best_model_path =  f\"{automl_job_dir}/best_model\"\n",
    "\n",
    "while True:\n",
    "    if os.path.exists(best_model_path) and len(os.listdir(best_model_path)) > 0 and os.path.exists(f\"{best_model_path}/controller.json\"):\n",
    "        #List the binary model file\n",
    "        print(\"\\nCheckpoints for the best performing experiment\")\n",
    "        if os.path.exists(best_model_path+\"/weights\") and len(os.listdir(best_model_path+\"/weights\")) > 0:\n",
    "            print(f\"Folder: {best_model_path}/weights\")\n",
    "            print(\"Files:\", os.listdir(best_model_path+\"/weights\"))\n",
    "        else:\n",
    "            print(f\"Folder: {best_model_path}\")\n",
    "            print(\"Files:\", os.listdir(best_model_path))\n",
    "\n",
    "        experiment_artifacts = json.load(open(f\"{best_model_path}/controller.json\",\"r\"))\n",
    "        data_frame = pd.DataFrame(experiment_artifacts)\n",
    "        # Print experiment id/number and the corresponding result\n",
    "        print(\"\\nResults of all experiments\")\n",
    "        with pd.option_context('display.max_rows', None, 'display.max_columns', None, 'display.max_colwidth', None):\n",
    "            print(data_frame[[\"id\",\"result\"]])\n",
    "\n",
    "        print(\"\\nConfig/Spec file for the best performing experiment (recommendation_id.kitti with the maximum result value in the dataframe)\")\n",
    "        # List the recommendation config file of the best performing checkpoint(recommendation_id.kitti with the maximum result value in the dataframe)\n",
    "        !ls {best_model_path}/*.kitti \n",
    "            \n",
    "        break"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "MD"
    }
   },
   "source": [
    "### Delete experiment <a class=\"anchor\" id=\"head-14\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "CODE"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "! rm -rf ~/shared/users/{os.environ['USER']}/models/{model_id}\n",
    "! echo DONE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "MD"
    }
   },
   "source": [
    "### Delete datasets <a class=\"anchor\" id=\"head-15\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "datalore": {
     "hide_input_from_viewers": false,
     "hide_output_from_viewers": false,
     "type": "CODE"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "! rm -rf ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}\n",
    "! rm -rf ~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}\n",
    "! echo DONE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Unmount shared volume <a class=\"anchor\" id=\"head-16\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "command = \"umount ~/shared\"\n",
    "! echo {password} | sudo -S -k {command} && echo DONE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Uninstall TAO Remote Client <a class=\"anchor\" id=\"head-32\"></a>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "! pip3 uninstall -y nvidia-transfer-learning-client"
   ]
  }
 ],
 "metadata": {
  "datalore": {
   "base_environment": "default",
   "computation_mode": "JUPYTER",
   "package_manager": "pip",
   "packages": [],
   "version": 1
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.5"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
